Digital signal compression encoding with improved quantisation

ABSTRACT

In compression encoding of a digital signal, such as MPEG2, transform coefficients are quantised with the lower bound of each interval being controlled by a parameter lambda. In the MPEG2 reference coder, for example, lambda=0.75. Because the quantised coefficients are variable length coded, improved quality or reduced bit rates can be achieved by controlling lambda so as to vary dynamically the bound of each interval with respect to the associated representation level. The parameter lambda can vary with coefficient amplitude, with frequency, or with quantisation step size. In a transcoding operation, lambda can also vary with parameters in the initial coding operation.

FIELD OF THE INVENTION

This invention relates to the compression of digital video, audio or other signals.

BACKGROUND OF THE INVENTION

Compression encoding generally involves a number of separate techniques. These will usually include a transformation, such as the block-based discrete cosine transform (DCT) of MPEG-2; an optional prediction step; a quantisation step; and variable length coding. This invention is particularly concerned in this context with quantisation.

The quantisation step maps a range of original amplitudes onto the same representation level. The quantisation process is therefore irreversible. MPEG-2 (in common with other compression standards such as MPEG-1, JPEG, CCITT/ITU-T Rec. H.261 and ITU-T Rec. H.263) defines representation levels and leaves undefined the manner in which the original amplitudes are mapped onto a given set of representation levels.

In general terms, a quantizer assigns to an input value, which may be continuous or may previously have been subjected to a quantisation process, a code usually selected from the quantization levels immediately above and immediately below the input value. The error in such a quantization will generally be minimised if the quantization level closest to the input value is selected. In a compression system, it is further necessary to consider the efficiency with which the respective quantization levels may be coded. In variable length coding, the quantization levels which are employed most frequently are assigned the shortest codes.

Typically, the zero level has the shortest code. A decision to assign a higher quantization level on the basis that it is the closest, rather than a lower level (and especially the zero level), will therefore decrease coding efficiency. In MPEG2, the overall bit rate of the compressed signal is maintained beneath a pre-determined limit by increasing the separation of quantization levels in response to a tendency toward a higher bit rate. Repeated decisions to assign quantization levels on the basis of which is closest may therefore, through coding inefficiency, lead to a coarser quantization process.

The behaviour of a quantizer in this respect may be characterised through a parameter λ which is arithmetically combined with the input value, with one value of λ (typically λ=1) representing the selection of the closest quantization level, or “rounding”. A different value of λ (typically λ=0) will in contrast represent the automatic choice of the lower of the two nearest quantization levels, or “truncating”. In the MPEG2 reference coder, an attempt is made to compromise between the nominal reduction in error which is the attribute of rounding and the tendency toward bit rate efficiency which is associated with truncating, by setting a standard value of λ=0.75.

Whilst particular attention has here been paid to MPEG2 coding, similar considerations apply to other methods of compression encoding of a digital signal which include the steps of conducting a transformation process to generate values, and quantising the values by partitioning the amplitude range of a value into a set of adjacent intervals, whereby each interval is mapped onto a respective one of a set of representation levels which are to be variable length coded, such that a bound of each interval is controlled by a parameter λ. The transformation process may take a large variety of forms, including block-based transforms such as the DCT of MPEG2, and sub-band coding.

SUMMARY OF THE INVENTION

It is an object of one aspect of the present invention to provide an improvement in such a method which enables higher quality to be achieved at a given bit rate, or a reduction in bit rate for a given level of quality.

Accordingly, the present invention is in one aspect characterised in that λ is controlled so as to vary dynamically the bound of each interval with respect to the associated representation level.

Suitably, each value is arithmetically combined with λ.

Advantageously, λ is:

a function of the quantity represented by the value;

where the transformation is a DCT, a function of horizontal and vertical frequency;

a function of the quantisation step size; or

a function of the amplitude of the value.

In a particular form of the present invention, the digital signal to be encoded has been subjected to previous encoding and decoding processes, and λ is controlled as a function of a parameter in said previous encoding and decoding processes.

In a further aspect, the present invention consists in a (q, λ) quantiser operating on a set of transform coefficients $x_k$ representative of respective frequency indices $f_k$, in which λ is dynamically controlled in dependence upon the values of $x_k$ and $f_k$.

Advantageously, λ is dynamically controlled to minimise a cost function D+μH, where D is a measure of the distortion introduced by the quantisation in the uncompressed domain and H is a measure of the compressed bit rate.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described by way of example with reference to the accompanying drawings, in which:

FIG. 1 is a diagram illustrating the relationships between representation levels, decision levels and the value of λ;

FIG. 2 is a block diagram representation of the quantization process in the MPEG2 reference coder;

FIG. 3 is a block diagram representation of a simplified and improved quantization process;

FIG. 4 is a block diagram representation of the core elements of FIG. 3;

FIG. 5 is a block diagram representation of a quantization process according to one aspect of the present invention; and

FIG. 6 is a block diagram representation of a quantization process according to a further aspect of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In the specifically mentioned compression standards, the original amplitude x results from a discrete cosine transform (DCT) and is thus related to a horizontal frequency index $f_{hor}$ and a vertical frequency index $f_{ver}$. Whilst this approach is taken as an example in what follows, the invention is not restricted in this regard.

In general, a quantiser describes a mapping from an original amplitude x of frequencies $f_{hor}$ and $f_{ver}$ onto an amplitude $y=Q(x)$. The mapping performed by the quantiser is fully determined by the set of representation levels $\{r_l\}$ and by the corresponding decision levels $\{d_l\}$, as illustrated in FIG. 1. All original amplitudes in the range $d_l \le x < d_{l+1}$ are mapped onto the same representation level $y=Q(x)=r_l$. As can be seen from FIG. 1, consecutive decision levels are related by the quantisation step size q:

$$d_{l+1} = d_l + q \tag{1}$$

and, for a given representation level $r_l$, the corresponding decision level is calculated as:

$$d_l = r_l - \frac{\lambda}{2}\cdot q \tag{2}$$

The quantiser is fully specified by the quantisation step-size q and the parameter λ for a given set of representation levels $\{r_l\}$. Therefore, a quantiser that complies with equations (1) and (2) can be referred to as a (q, λ) quantiser.

Currently proposed quantisers, as described in the reference coders for the H.261, H.263, MPEG-1 and MPEG-2 standards, all apply a special type of (q, λ) quantiser in that a fixed value of λ is used: for example, λ=0.75 in the MPEG-2 reference coder or λ=1.0 in the MPEG-1 reference coder for quantisation of intra-DCT-coefficients.

According to one aspect of this invention, λ is not constant but is a function that depends on the horizontal frequency index $f_{hor}$, the vertical frequency index $f_{ver}$, the quantisation step-size q and the amplitude x:

$$\lambda = \lambda(f_{hor}, f_{ver}, q, x) \tag{3}$$

Examples of ways in which the function may usefully be derived to improve picture quality in video compression at a given bit-rate (or to reduce the required bit-rate at a given picture quality) will be set out below.

The invention extends also to the case of transcoding, in which a first generation amplitude $y_1=Q_1(x)$ is mapped onto a second generation amplitude $y_2=Q_2(y_1)$ to further reduce the bit-rate from the first to the second generation without having access to the original amplitude x. In this case, the first generation quantiser $Q_1$ and the second generation quantiser $Q_2$ are described as a $(q_1, \lambda_1)$-type quantiser and a $(q_2, \lambda_2)$-type quantiser, respectively. The second generation value $\lambda_2$ is described as a function:

$$\lambda_2 = \lambda_2(f_{hor}, f_{ver}, q_1, \lambda_1, q_2, \lambda_{2,ref}, y_1) \tag{4}$$

The parameter $\lambda_{2,ref}$ that appears in Eqn. (4) is applied in a reference $(q_2, \lambda_{2,ref})$-type quantiser. This reference quantiser bypasses the first generation and directly maps an original amplitude x onto a second generation reference amplitude $y_{2,ref}=Q_{2,ref}(x)$.

The functional relationship of Eqn. (4) can be used to minimise the error $(y_2-y_{2,ref})$ or the error $(y_2-x)$. In the first case, the resulting second generation quantiser may be called a maximum a-posteriori (MAP) quantiser. In the second case, the resulting second generation quantiser may be called a mean squared error (MSE) quantiser. Examples of the second generation $(q_2, \lambda_{2,MAP})$-type and $(q_2, \lambda_{2,MSE})$-type quantisers are given below. For a more detailed explanation of the theoretical background, reference is directed to the paper “Transcoding of MPEG-2 intra frames”, Oliver Werner, IEEE Transactions on Image Processing, 1998, which will for ease of reference be referred to hereafter as “the Paper”. A copy of the Paper is appended to British patent application No. 9703831, from which the present application claims priority.

The present invention refers specifically to quantization of ‘intra’ DCT coefficients in MPEG2 video coding, but can be applied to non-intra coefficients, to other video compression schemes and to compression of signals other than video. In MPEG2, the prior art is provided by what is known as Test Model 5 (TM5). The quantization scheme of TM5 for positive intra coefficients is illustrated in FIG. 2.

In order to simplify the description, the diagram of FIG. 2 will be replaced by FIG. 3, which illustrates essentially the same quantizer except for small values of q, where it corrects an anomaly as described in the Paper.

In this quantizer, the incoming coefficients are first divided by quantizer weighting matrix values, W, which depend on the coefficient frequency but which are fixed across the picture, and then by a quantizer scale value q, which can vary from one macroblock to the next but which is the same for all coefficient frequencies. Prior to the adder, the equivalent inverse quantizer reconstruction levels are simply the integers 0, 1, 2, .... A fixed number, λ/2, is then added to the value and the result truncated. The significance of λ is that a value of 0 makes the quantizer (of the value input to the adder) a simple truncation, while a value of 1 makes it a rounding operation. In TM5, the value of λ is fixed at 0.75.

Attention will hereafter be focused on the operation of the ‘core’ quantizer shown in FIG. 4.

In a class of MPEG-2 compatible quantisers for intra frame coding, non-negative original dct-coefficients x (or the same coefficients after division by weighting matrix values W) are mapped onto the representation levels as:

$$y = Q(x) = \left\lfloor \frac{x}{q} + \frac{\lambda}{2} \right\rfloor \cdot q \tag{5}$$

The floor function ⌊a⌋ extracts the integer part of the given argument a.

Negative values are mirrored:

$$y = -Q(|x|) \tag{6}$$

The amplitude range of the quantisation step-size q in eq. (5) is standardised; q has to be transmitted as side information in every MPEG-2 bit stream. This does not hold for the parameter λ in eq. (5). This parameter is not needed for reconstructing the dct-coefficients from the bit stream, and is therefore not transmitted. However, the λ-value controls the mapping of the original dct-coefficients x onto the given set of representation levels

$$r_l = l\cdot q \tag{7}$$

According to eq. (5), the (positive) x-axis is partitioned by the decision levels

$$d_l = \left(l - \frac{\lambda}{2}\right)\cdot q, \quad l = 1, 2, \ldots \tag{8}$$

Each $x \in [d_l, d_{l+1})$ is mapped onto the representation level $y=r_l$. As a special case, the interval $[0, d_1)$ is mapped onto $y=0$.

The parameter λ can be adjusted for each quantisation step-size q, resulting in a distortion-rate optimised quantisation: the mean-squared-error

$$D = E\left[(x-y)^2\right] \tag{9}$$

is minimised under a bit rate constraint imposed on the coefficients y. In order to simplify the analysis, the first order source entropy

$$H = -\sum_l P_l \log_2 P_l \tag{10}$$

of the coefficients y, instead of the MPEG-2 codeword table, is taken to calculate the bit rate. It has been verified in the Paper that the entropy H can be used to derive a reliable estimate for the number of bits that result from the MPEG-2 codeword table. In Eqn. (10), $P_l$ denotes the probability for the occurrence of the coefficient $y=r_l$.

The above constrained minimisation problem can be solved by applying the Lagrange multiplier method, introducing the Lagrange multiplier μ. One then gets the basic equation to calculate the quantisation parameter λ:

$$\frac{\partial D}{\partial\lambda} + \mu\cdot\frac{\partial H}{\partial\lambda} = 0 \tag{11}$$

Note that the solution for λ obtained from Eqn. (11) depends on the value of μ. The value of μ is determined by the bit rate constraint

$$H \le H_0 \tag{12}$$

where $H_0$ specifies the maximum allowed bit rate for encoding the coefficients y. In general, the amplitude range of the Lagrange multiplier is $0<\mu<\infty$. In the special case of $H_0\to\infty$, one obtains $\mu\to 0$. Conversely, for $H_0\to 0$, one obtains in general $\mu\to\infty$.

The Laplacian probability density function (pdf) is an appropriate model for describing the statistical distribution of the amplitudes of the original dct-coefficients. This model is now applied to evaluate Eqn. (11) analytically. One then obtains a distortion-rate optimised quantiser characteristic by inserting the resulting value for λ in eq. (5).

Due to the symmetric quantiser characteristic for positive and negative amplitudes in Eqns. (5) and (6), we introduce a pdf p for describing the distribution of the absolute original amplitudes |x|. The probability $P_0$ for the occurrence of the coefficient $y=0$ can then be specified as

$$P_0 = \int_0^{\left(1-\frac{\lambda}{2}\right)\cdot q} p(x)\,dx \tag{13}$$

Similarly, the probability $P_l$ for the coefficient $|y| = r_l$ becomes

$$P_l = \int_{\left(l-\frac{\lambda}{2}\right)\cdot q}^{\left(l+1-\frac{\lambda}{2}\right)\cdot q} p(x)\,dx, \quad l = 1, 2, \ldots \tag{14}$$

With Eqns. (13) and (14), the partial derivative of the entropy H of eq. (10) can be written, after a straightforward calculation, as

$$\frac{\partial H}{\partial\lambda} = \frac{q}{2}\cdot\sum_{l\ge 0} p\left(\left(l+1-\frac{\lambda}{2}\right)\cdot q\right)\cdot\log_2\frac{P_l}{P_{l+1}} \tag{15}$$

From eq. (9) one can first deduce

$$D = \int_0^{\left(1-\frac{\lambda}{2}\right)\cdot q} x^2\, p(x)\,dx + \sum_{l\ge 1}\int_{\left(l-\frac{\lambda}{2}\right)\cdot q}^{\left(l+1-\frac{\lambda}{2}\right)\cdot q} (x - l\cdot q)^2\, p(x)\,dx \tag{16}$$

and further, from eq. (16),

$$\frac{\partial D}{\partial\lambda} = -\frac{q^3}{2}\cdot(1-\lambda)\cdot\sum_{l\ge 0} p\left(\left(l+1-\frac{\lambda}{2}\right)\cdot q\right) \tag{17}$$

It can be seen from eq. (17) that

$$\frac{\partial D}{\partial\lambda} \le 0 \quad \text{if } 0 \le \lambda \le 1 \tag{18}$$

Thus, when λ is increased from zero to one, the resulting distortion D decreases monotonically until the minimum value is reached at λ=1. The latter is the solution to the unconstrained minimisation of the mean-squared-error; however, the resulting entropy H will in general not fulfil the bit rate constraint of eq. (12).

Under the assumption $P_l \ge P_{l+1}$ in eq. (15), we see that $\partial H/\partial\lambda \ge 0$. Thus, there is a monotonic behaviour: when λ is increased from zero to one, the resulting distortion D monotonically decreases while, at the same time, the resulting entropy H monotonically increases. An iterative algorithm follows directly from this monotonic behaviour. The parameter λ is initially set to λ=1, and the resulting entropy H is computed. If H is larger than the target bit rate $H_0$, the value of λ is decreased in further iteration steps until the bit rate constraint of eq. (12) is fulfilled. While this iterative procedure forms the basis of a simplified distortion-rate method proposed for transcoding of I-frames, we continue here to derive an analytical solution for λ.

Eqns. (15) and (17) can be evaluated for the Laplacian model:

$$p(x) = \beta\cdot\alpha\cdot e^{-\alpha x} \quad \text{if } x \ge d_1 = \left(1-\frac{\lambda}{2}\right)\cdot q \tag{19}$$

After inserting the model pdf of Eqn. (19) in Eqns. (15) and (17), it can be shown that the basic equation (11) leads to the analytical solution for λ,

$$\lambda = 1 - \frac{\mu}{q^2}\cdot\left[h(z) + (1-z)\cdot\log_2\left(\frac{P_0}{1-P_0}\right)\right] \tag{20}$$

with $z=e^{-\alpha\cdot q}$ and the ‘z’-entropy

$$h(z) = -z\cdot\log_2 z - (1-z)\cdot\log_2(1-z) \tag{21}$$

Eqn. (20) provides only an implicit solution for λ, as the probability $P_0$ on the right hand side depends on λ according to eq. (13). In general, the value of $P_0$ can be determined only for known λ, by applying the quantiser characteristic of Eqns. (5) and (6) and counting the relative frequency of the event $y=0$. However, eq. (20) is a fixed-point equation for λ, which becomes more obvious if the right hand side is described by the function

$$g(\lambda) = 1 - \frac{\mu}{q^2}\cdot\left[h(z) + (1-z)\cdot\log_2\left(\frac{P_0}{1-P_0}\right)\right] \tag{22}$$

resulting in the classical fixed-point form $\lambda=g(\lambda)$. Thus, it follows from the fixed-point theorem of Stefan Banach that the solution for λ can be found by an iterative procedure with

$$\lambda_{j+1} = g(\lambda_j) \tag{23}$$

in the (j+1)-th iteration step. The iteration of (23) converges towards the solution for an arbitrary initial value $\lambda_0$ if the function g is ‘self-contracting’, i.e. Lipschitz-continuous with a Lipschitz constant smaller than one. As an application of the mean value theorem of differential calculus, it is not difficult to prove that g is always ‘self-contracting’ if the absolute value of its partial derivative $\partial g/\partial\lambda$ is less than one. This yields the convergence condition

$$1 > \left|\frac{\partial g}{\partial\lambda}\right| = \frac{1}{2\cdot\ln(2)}\cdot\frac{\mu}{q}\cdot(1-z)\cdot\frac{\alpha}{P_0} \tag{24}$$

A distortion-rate optimised quantisation method will now be derived based on the results obtained above. As an example, a technique is outlined for quantising the AC-coefficients of MPEG-2 intra frames. It is straightforward to modify this technique for quantising the dct-coefficients of MPEG-2 inter frames, i.e. P- and B-frames.

Firstly, one has to take into account that the 63 AC-coefficients of an 8×8 dct-block do not share the same distribution. Thus, an individual Laplacian model pdf according to eq. (19) with parameter $\alpha_i$ is assigned to each AC-frequency index i. This results in an individual quantiser characteristic according to Eqns. (5) and (6) with parameter $\lambda_i$. Furthermore, the quantisation step-size $q_i$ depends on the visual weight $w_i$ and a frequency-independent qscale parameter as

$$q_i = \frac{w_i\cdot qscale}{16} \tag{25}$$

For a given step-size $q_i$, the quantisation results in a distortion $D_i(\lambda_i)$ and a bit rate $H_i(\lambda_i)$ for the AC-coefficients of the same frequency index i. As the dct is an orthogonal transform, and as the distortion is measured by the mean-squared-error, the resulting distortion D in the spatial (sample/pixel) domain can be written as

$$D = c\cdot\sum_i D_i(\lambda_i) \tag{26}$$

with some positive normalising constant c. Alternatively, the distortion can be measured in the weighted coefficient domain in order to compensate for the variation in the human visual response at different spatial frequencies.

Similarly, the total bit rate H becomes

$$H = \sum_i H_i(\lambda_i) \tag{27}$$

For a distortion-rate optimised quantisation, the 63 parameters $\lambda_i$ have to be adjusted such that the cost function

$$D + \mu\cdot H \tag{28}$$

is minimised. The non-negative Lagrange multiplier μ is determined by the bit rate constraint

$$H \le H_0 \tag{29}$$

Alternatively, if the distortion is expressed in the logarithmic domain as:

$$D' = 20\log_{10} D \ \text{dB} \tag{28a}$$

the cost function to be minimised becomes:

$$B = D' + \mu' H \tag{28b}$$

where μ′ is now an a priori constant linking distortion to bit rate.

A theoretical argument based on coding white noise gives a law of 6 dB per bit per coefficient. Observation of actual coding results at different bit rates gives a law of k dB per bit, where k takes values from about 5 to about 8 depending on the overall bit rate. In practice, therefore, the intuitive ‘6 dB’ law corresponds well with observation.

Additionally, the qscale parameter can be changed to meet the bit rate constraint of Eqn. (29). In principle, the visual weights $w_i$ offer another degree of freedom, but for simplicity we assume a fixed weighting matrix as in the MPEG-2 reference decoder. This results in the following distortion-rate optimised quantisation technique, which can be stated in a ‘C’-language-like form:

    /* Begin of quantising the AC-coefficients in MPEG-2 intra frames */
    D_min = ∞;
    for (qscale = qmin; qscale <= qmax; qscale = qscale + 2)   /* linear qscale table */
    {
        μ = 0;
        do {
            Step 1: determine λ_1, λ_2, ..., λ_63 by minimising D + μ·H;
            Step 2: calculate H = Σ_i H_i(λ_i);
            μ = μ + δ;                        /* δ to be selected appropriately */
        } while (H > H_0);
        Step 3: calculate D = c·Σ_i D_i(λ_i);
        if (D < D_min) {
            qscale_opt = qscale;
            for (i = 1; i <= 63; i = i + 1)
                λ_i,opt = λ_i;
            D_min = D;
        }
    }
    for (i = 1; i <= 63; i = i + 1)
    {
        q_i,opt = w_i · qscale_opt / 16;
        quantise all AC-coefficients x of frequency index i by
            y = Q_i(x) = ⌊|x| / q_i,opt + λ_i,opt / 2⌋ · q_i,opt · sgn(x);
    }
    /* End of quantising the AC-coefficients in MPEG-2 intra frames */

There are several options for performing Steps 1 to 3:

1. Options for performing Step 1

The parameters λ₁, λ₂, ..., λ₆₃ can be determined

a) analytically, by applying Eqns. (20)-(23) above.

b) iteratively, by dynamic programming of D + μ·H, where any of the options described in the following points can be used to calculate D and H.

2. Options for performing Step 2

$H=\sum_i H_i(\lambda_i)$ can be calculated

a) by applying the Laplacian model pdf, resulting in

$$H = \sum_i \left[ h(P_{0,i}) + (1-P_{0,i})\cdot\frac{h(z_i)}{1-z_i} \right] \tag{32}$$

where $h(P_{0,i})$ and $h(z_i)$ are the entropies, as defined in eq. (21), of $P_{0,i}$ (eq. (13)) and $z_i=e^{-\alpha_i\cdot q_i}$, respectively. Note that $P_{0,i}$ in Eqn. (32) can be determined by counting, for each dct-frequency index i, the relative frequency of the zero-amplitude $y=Q_i(x)=0$. Interestingly, eq. (32) shows that the quantisation parameters $\lambda_i$ affect the resulting bit rate H only through the zero-amplitude probabilities $P_{0,i}$.

b) from a histogram of the original dct-coefficients, resulting, with Eqns. (10), (13) and (14), in

$$H = -\sum_i\sum_l P_{l,i}\cdot\log_2 P_{l,i} \tag{33}$$

c) by applying the MPEG-2 codeword table

3. Options for performing Step 3

$D=c\cdot\sum_i D_i(\lambda_i)$ can be calculated

a) by applying the Laplacian model pdf of Eqn. (19) and evaluating Eqn. (16).

b) by calculating $D=E[(x-y)^2]$ directly from a histogram of the original dct-coefficients x.

Depending on which options are chosen for Steps 1 to 3, the proposed method results in a single-pass encoding scheme if the Laplacian model pdf is chosen, or in a multi-pass scheme if the MPEG-2 codeword table is chosen. Furthermore, the method can be applied on a frame, macroblock or 8×8-block basis, and the options can be chosen appropriately. The latter is of particular interest for any rate control scheme that sets the target bit rate $H_0$ either locally on a macroblock basis or globally on a frame basis.

Furthermore, we note that the proposed method automatically skips high-frequency dct-coefficients if this is the best option in the rate-distortion sense. This is indicated if the final quantisation parameter $\lambda_{i,opt}$ has a value close to one for low-frequency indices i but a small value, e.g. zero, for high-frequency indices.

A distortion-rate optimised quantisation method for MPEG-2 compatible coding has been described, with several options for an implementation. The invention can immediately be applied to standalone (first generation) coding. In particular, the results help in designing a sophisticated rate control scheme.

The quantiser characteristic of eqs. (5) and (6) can be generalised to

$$y = Q(x) = r(x) + \left\lfloor \frac{x - r(x)}{q(x)} + \frac{\lambda(x)}{2} \right\rfloor \cdot q(x) \tag{34}$$

for non-negative amplitudes x. The floor function ⌊a⌋ in eq. (34) returns the integer part of the argument a. Negative amplitudes are mirrored,

$$y = -Q(|x|) \tag{35}$$

The generalisation is reflected by the amplitude-dependent values λ(x), q(x) and r(x) in eq. (34). For a given set of representation levels $\ldots < r_{l-1} < r_l < r_{l+1} < \ldots$ and a given amplitude x, the pair of consecutive representation levels is selected that fulfils

$$r_{l-1} \le x < r_l \tag{36}$$

The value of the local representation level is then set to

$$r(x) = r_{l-1} \tag{37}$$

The value of the local quantisation step-size results from

$$q(x) = q_l = r_l - r_{l-1} \tag{38}$$

A straightforward extension of the rate-distortion concept detailed above yields, for the local lambda parameter, an expression very similar to eq. (20):

$$\lambda(x) = \lambda_l = 1 - \frac{\mu}{q_l^2}\cdot\log_2\left(\frac{P_{l-1}}{P_l}\right), \quad l = 1, \ldots, L \tag{39}$$

Similar to eqs. (13) and (14), the probabilities in eq. (39) depend on the lambda parameters:

$$P_0 = \int_0^{r_1 - \frac{\lambda_1}{2}\cdot q_1} p(x)\,dx \tag{40}$$

and

$$P_l = \int_{r_l - \frac{\lambda_l}{2}\cdot q_l}^{r_{l+1} - \frac{\lambda_{l+1}}{2}\cdot q_{l+1}} p(x)\,dx, \quad l \ge 1 \tag{41}$$

Therefore, eq. (39) represents a system of non-linear equations for determining the lambda parameters $\lambda_1, \ldots, \lambda_L$. In general, this system can only be solved numerically.

However, eq. (39) can be simplified if the term $\log_2(P_{l-1}/P_l)$ is interpreted as the difference

$$I_l - I_{l-1} = \log_2\left(\frac{P_{l-1}}{P_l}\right) \tag{42}$$

of optimum codeword lengths

$$I_l = -\log_2 P_l, \qquad I_{l-1} = -\log_2 P_{l-1} \tag{43}$$

associated with the representation levels $r_l$ and $r_{l-1}$.

A practical implementation of the above will now be described.

Once the probability distribution, parametric or actual, of the unquantized coefficients is known, it is possible to choose a set of quantizer decision levels that will minimise the cost function B, because both the entropy H and the distortion D are known as functions of the decision levels for a given probability distribution. This minimization can be performed off-line and the calculated sets of decision levels stored for each of a set of probability distributions.

In general, it will be seen that the optimum value of λ corresponding to each decision level is different for different coefficient amplitudes. In practice, it appears that the greatest variation in the optimum value of λ with amplitude is apparent between the innermost quantizer level (the one whose reconstruction level is 0) and all the other levels. This means that it may be sufficient in some cases to calculate, for each coefficient index and for each value (suitably quantized) of the probability distribution parameter, two values of λ: one for the innermost quantizer level and one for all the others.

A practical approach following the above description is shown in FIG. 5.

The DCT coefficients are taken to a linear quantizer 52 providing the input to a histogram building unit 54. The histogram is thus based on linearly quantized versions of the input DCT coefficients. The level spacing of that linear quantizer 52 is not critical but should probably be about the same as the average value of q. The extent of the histogram function required depends on the complexity of the parametric representation of the pdf; in the case of a Laplacian or Gaussian distribution it may be sufficient to calculate the mean or variance of the coefficients, while in the ‘zero excluded’ Laplacian used in the Paper it is sufficient to calculate the mean and the proportion of zero values. This histogram, which may be built up over a picture period or longer, is used in block 56 as the basis of an estimate of the pdf parameter or parameters, providing one of the inputs to the calculation of λ in block 58.

Another input to the calculation of λ is from a set of comparators 60, which are in effect a coarse quantizer, determining in which range of values the coefficient to be quantized falls. In the most likely case described above, it is sufficient to compare the value with the innermost non-zero reconstruction level. The final input required to calculate λ is the quantizer scale.

In general, an analytical equation for λ cannot be obtained. Instead, a set of values can be calculated numerically for various combinations of pdf parameters, comparator outputs and quantizer scale values, and the results stored in a lookup table. Such a table need not be very large (it may, for example, contain fewer than 1000 values) because the optima are not very sharp.

The value of λ calculated is then divided by 2 and added in adder 62 to the coefficient prior to the final truncation operation in block 64.

Instead of using variable codeword lengths that depend on the currentprobabilities according to eq. (43), a fixed table of variable codewordlengths C₀, . . . , C_(L) can be applied to simplify the process. Thevalues of C₀, . . . , C_(L) can be determined in advance by designing asingle variable length code, ie. a Huffman code, for a set of trainingsignals and bit rates. In principle, they can also be obtained directlyfrom the MPEG2 variable-length code table. The only complication is thefact that MPEG2 variable-length coding is based on combinations of runsof zero coefficients terminated by non-zero coefficients.

One solution to this problem is to estimate ‘equivalent codeword lengths’ from the MPEG2 VLC tables. This can be done quite easily if one assumes that the probability distributions of the DCT coefficients are independent of each other. Another possibility is to consider the recent past history of quantization within the current DCT block to estimate the likely effect of each of the two possible quantization levels on the overall coding cost.

Then, eq. (39) changes to

$\lambda(x) = \lambda_l = 1 - \dfrac{\mu}{q_l^2}\left(C_l - C_{l-1}\right), \quad l = 1, \ldots, L \qquad (44)$

The resulting distortion-rate optimised quantisation algorithm is essentially the same as detailed previously, except that the lambda parameters are calculated either from eq. (39) or eq. (44) for each pair of horizontal and vertical frequency indices.
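Eq. (44) can be evaluated directly once the fixed codeword lengths are known. The sketch below assumes μ is supplied as a constant (its derivation belongs to eq. (39) and the preceding discussion) and that one step size q_l is given per interval l = 1, …, L; the function name `lambdas_eq44` is hypothetical.

```python
def lambdas_eq44(mu, q, C):
    """Eq. (44): lambda_l = 1 - (mu / q_l**2) * (C_l - C_{l-1}), l = 1..L.

    C -- fixed codeword lengths C_0..C_L (length L+1)
    q -- step sizes q_1..q_L (length L)
    """
    L = len(C) - 1
    if len(q) != L:
        raise ValueError("need one step size q_l for each l = 1..L")
    return [1.0 - mu / q[l - 1] ** 2 * (C[l] - C[l - 1]) for l in range(1, L + 1)]
```

A larger jump in codeword length between neighbouring levels lowers λ_l, pushing the decision level towards the upper representation level so that the cheaper, lower level is chosen more often.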

A simplified method of calculating λ(x) will now be described, in which only the local distortion is considered for each coefficient.

Here, we make use of the fact that the variable-length code (VLC) table used for a given picture in MPEG2 is fixed and known. This should simplify the calculations of the trade-off between bit rate and distortion and make them more accurate. In particular, the calculations can be made on a coefficient basis, since the effect on the bit rate of the options for quantizing a particular coefficient is immediately known. The same is true (although a little more difficult to justify) of the effect on the quantizing distortion.

If we accept the assumptions implied in the above paragraph, then we can very simply calculate the value of the decision level that minimizes the local contribution to the cost function B. This will in fact be the level at which the reduction in the bit count obtained by quantizing to the lower reconstruction level (rather than the higher level) is exactly offset by the corresponding increase in quantizing distortion.

If the two reconstruction levels being considered have indices i and i+1, the corresponding codewords have lengths $L_i$ and $L_{i+1}$, and the quantizer scale is q, then:

(i) the reduction in bit count is $L_{i+1} - L_i$;

(ii) the local increase in distortion is $20\log_{10}\left[q\left(1 - \lambda/2\right)\right] - 20\log_{10}\left[q\lambda/2\right]$ dB.
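Item (ii) can be checked from the position of the decision level. Assuming, as in the adder-and-truncate arrangement described earlier, that a coefficient x is mapped to the level ⌊x/q + λ/2⌋, the decision level $x_d$ between the reconstruction levels $iq$ and $(i+1)q$, and its distances from them, are

$$x_d = q\left(i + 1 - \frac{\lambda}{2}\right), \qquad x_d - iq = q\left(1 - \frac{\lambda}{2}\right), \qquad (i+1)q - x_d = q\,\frac{\lambda}{2}$$

so the quantizer scale cancels in the difference of the two logarithms:

$$20\log_{10}\left[q\left(1 - \frac{\lambda}{2}\right)\right] - 20\log_{10}\left[q\,\frac{\lambda}{2}\right] = 20\log_{10}\left(\frac{2}{\lambda} - 1\right)$$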

Combining these using the rule linking distortion to bit rate (each additional bit reduces the quantizing distortion by about 6 dB), we have

$6\left(L_{i+1} - L_i\right) = 20\log_{10}\left(2/\lambda - 1\right) \qquad (45)$

or, more simply

$L_{i+1} - L_i = \log_2\left(2/\lambda - 1\right) \qquad (46)$

leading to

$\lambda = \dfrac{2}{1 + 2^{\,L_{i+1} - L_i}} \qquad (47)$

This elegant result shows that the value of λ here depends only on the difference in bit count between the higher and lower quantizer reconstruction levels.
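Eq. (47) is simple enough to check numerically. The sketch below (function names hypothetical) evaluates λ from the two codeword lengths and confirms that inverting it recovers eq. (46).

```python
import math

def lambda_eq47(len_lower, len_upper):
    """Eq. (47): lambda = 2 / (1 + 2**(L_{i+1} - L_i))."""
    return 2.0 / (1.0 + 2.0 ** (len_upper - len_lower))

def bit_difference_eq46(lam):
    """Eq. (46) solved for the bit difference: log2(2/lambda - 1)."""
    return math.log2(2.0 / lam - 1.0)
```

Equal codeword lengths give λ = 1, i.e. ordinary rounding to the nearest level; one extra bit on the upper level gives λ = 2/3, and two extra bits give λ = 0.4, moving the decision level progressively closer to the upper reconstruction level.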

The fact that λ is now independent of both the coefficient probability distribution and the quantizer scale leads to the much simplified implementation shown in FIG. 6.

Here, the DCT coefficients are passed to the side-chain truncate block 70 before serving as the address in a coding cost lookup table 72. The value of λ/2 is provided to adder 76 by block 74, and the output is truncated in truncate block 78.
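The FIG. 6 arrangement can be sketched as follows. This is an illustration under stated assumptions: `code_len` is a hypothetical per-level codeword-length table standing in for the coding cost lookup table 72 (real MPEG2 costs also depend on run lengths), coefficients are normalised by q, and magnitudes beyond the end of the table fall back to λ = 1.

```python
import math

def lambda_eq47(len_lower, len_upper):
    # Eq. (47): lambda depends only on the codeword-length difference.
    return 2.0 / (1.0 + 2.0 ** (len_upper - len_lower))

def fig6_quantize(x, q, code_len):
    """Quantize one coefficient in the style of FIG. 6.

    code_len is HYPOTHETICAL: code_len[i] is the codeword length in bits
    for magnitude level i.
    """
    mag = abs(x) / q
    i = math.floor(mag)                      # side-chain truncate block 70
    if i + 1 < len(code_len):
        # table 72 addressed by i; block 74 supplies lambda (halved below)
        lam = lambda_eq47(code_len[i], code_len[i + 1])
    else:
        lam = 1.0                            # beyond the table: plain rounding (assumption)
    # 0 < lambda/2 < 1, so the result is always level i or i + 1
    level = math.floor(mag + lam / 2.0)      # adder 76, then truncate block 78
    return level if x >= 0 else -level
```

Because λ/2 always lies strictly between 0 and 1, the side-chain truncation alone identifies which pair of levels is in contention, which is why a single table lookup per coefficient suffices.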

There have been described a considerable number of ways in which the present invention may be employed to improve quantisation in a coder; still others will be evident to the skilled reader. It should be understood that the invention is also applicable to transcoding and switching.

The case of a two-stage quantiser will now be addressed. This problem is treated in detail in the Paper, which sets out the theory of so-called maximum a posteriori (MAP) and mean squared error (MSE) quantisers. By way of further exemplification, there will now be described an implementation of the MAP and MSE quantisers for transcoding of MPEG2 [MPEG2] intra AC coefficients that result from an 8×8 discrete cosine transform (dct).

The class of first generation quantisers y₁ = Q₁(x) specified by these equations is spanned by the quantisation step size q₁ and the parameter λ₁; such a quantiser is called a (q₁, λ₁)-type quantiser.

In the transcoder, the first generation coefficients y₁ are mapped onto the second generation coefficients y₂ = Q₂(y₁) to further reduce the bit rate. Under the assumption of a (q₁, λ₁)-type quantiser in the first generation, e.g. the MPEG2 reference coder TM5, it follows from the results set out in the Paper that the MAP quantiser Q_(2,map) and the MSE quantiser Q_(2,mse) can be implemented as a (q₂, λ_(2,map))-type and a (q₂, λ_(2,mse))-type quantiser, respectively. For both the MAP and the MSE quantiser, the second generation step size q₂ is calculated from the second generation parameters w₂ and qscale₂. However, there are different equations for calculating λ_(2,map) and λ_(2,mse).
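The two-generation cascade can be sketched with a generic (q, λ)-type quantiser. This is a minimal sketch under assumptions not stated in this section: the quantiser is taken to map a magnitude to ⌊|x|/q + λ/2⌋ with the sign reapplied, and the first generation amplitude is reconstructed simply as level × q₁ (weighting matrices and MPEG2 mismatch control are ignored). Function names are hypothetical.

```python
import math

def quantize_ql(x, q, lam):
    """(q, lambda)-type quantiser (assumed form): sign(x) * floor(|x|/q + lam/2)."""
    level = math.floor(abs(x) / q + lam / 2.0)
    return level if x >= 0 else -level

def transcode(x, q1, lam1, q2, lam2):
    """Return (first generation level, second generation level) for amplitude x."""
    level1 = quantize_ql(x, q1, lam1)    # first generation, Q1(x)
    y1 = level1 * q1                     # reconstructed amplitude y1 (assumed level * q1)
    level2 = quantize_ql(y1, q2, lam2)   # second generation, y2 = Q2(y1)
    return level1, level2
```

With q₁ = 8, λ₁ = 0.75, q₂ = 16 and λ₂ = 1, an original amplitude of 35 becomes level 4 (y₁ = 32) in the first generation and level 2 in the second.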

With the results of the Paper, it follows that λ_(2,map) can be calculated as

$\lambda_{2,\mathrm{map}} = \lambda_{2,\mathrm{ref}} + \left(\mu_{\mathrm{map}} - \lambda_1\right)\cdot\dfrac{q_1}{q_2} \qquad (48)$

and λ_(2,mse) as

$\lambda_{2,\mathrm{mse}} = 1 + \left(\mu_{\mathrm{mse}} - \lambda_1\right)\cdot\dfrac{q_1}{q_2} \qquad (49)$

The parameter λ_(2,ref) can be varied in the range 0 ≤ λ_(2,ref) ≤ 1 to adjust the bit rate and the resulting signal-to-noise ratio. This gives an additional degree of freedom for the MAP quantiser compared with the MSE quantiser. The value λ_(2,ref) = 0.9 is particularly preferred. The parameters μ_map and μ_mse are calculated from the first generation quantisation step size q₁ and a z-value:

$\mu_{\mathrm{map}} = -\dfrac{2}{\ln\left(z^{q_1}\right)}\cdot\ln\left(\dfrac{2}{1 + z^{q_1}}\right) \qquad (50)$

$\mu_{\mathrm{mse}} = -\dfrac{2}{\ln\left(z^{q_1}\right)}\cdot\dfrac{1 - \left(1 - \ln\left(z^{q_1}\right)\right) z^{q_1}}{1 - z^{q_1}} \qquad (51)$

The amplitude range of the values that result from these equations can be limited to 0 ≤ μ_map, μ_mse ≤ 2. Similarly, the amplitude range of the resulting values can be limited to 0 ≤ λ_(2,map), λ_(2,mse) ≤ 2.
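Eqs. (48) to (51), together with the clipping just described, can be sketched as below. The handling of the end points is an assumption: eqs. (50) and (51) are undefined at z = 0 and z = 1, so the sketch returns their limiting values (0 as z^q₁ → 0, and 1 as z^q₁ → 1). Function names are hypothetical; λ_(2,ref) defaults to the preferred value 0.9.

```python
import math

def _clip(v, lo=0.0, hi=2.0):
    # Amplitude range limited to [0, 2], as described above.
    return max(lo, min(hi, v))

def _zq(z, q1):
    # z**q1, guarded so that non-positive z never yields a complex power.
    return z ** q1 if z > 0.0 else 0.0

def mu_map(z, q1):
    """Eq. (50), with limiting values at the end points (assumption)."""
    zq = _zq(z, q1)
    if zq <= 0.0:
        return 0.0
    if zq >= 1.0:
        return 1.0
    return _clip(-2.0 / math.log(zq) * math.log(2.0 / (1.0 + zq)))

def mu_mse(z, q1):
    """Eq. (51), with limiting values at the end points (assumption)."""
    zq = _zq(z, q1)
    if zq <= 0.0:
        return 0.0
    if zq >= 1.0:
        return 1.0
    a = math.log(zq)
    return _clip(-2.0 / a * (1.0 - (1.0 - a) * zq) / (1.0 - zq))

def lambda2_map(z, q1, lam1, q2, lam2_ref=0.9):
    """Eq. (48)."""
    return _clip(lam2_ref + (mu_map(z, q1) - lam1) * q1 / q2)

def lambda2_mse(z, q1, lam1, q2):
    """Eq. (49)."""
    return _clip(1.0 + (mu_mse(z, q1) - lam1) * q1 / q2)
```

For z = 0.9 and q₁ = 8 this gives μ_map ≈ 0.795 and μ_mse ≈ 0.861; with λ₁ = 0.75 and q₂ = 16 the second generation parameters are λ_(2,map) ≈ 0.923 and λ_(2,mse) ≈ 1.056. In a transcoder the calculation would be repeated for each of the per-frequency z-values.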

The z-value has a normalised amplitude range, i.e. 0 ≤ z ≤ 1, and can be calculated either from the first generation dct-coefficients y₁ or from the original dct-coefficients x as described in the Paper. In the latter case, the z-value is transmitted as additional side information, e.g. user data, along with the first generation bit stream, so that no additional calculation of z is required in the transcoder. Alternatively, a default z-value may be used. An individual z-value is assigned to each pair of horizontal and vertical frequency indices. This results in 63 different z-values for the AC coefficients of an 8×8 dct. As a consequence of the frequency dependent z-values, the parameters λ_(2,map) and λ_(2,mse) are also frequency dependent, resulting in 63 (q₂, λ_(2,map))-type quantisers and 63 (q₂, λ_(2,mse))-type quantisers, respectively. Additionally, there are different parameter sets for the luminance and the chrominance components. The default z-values for the luminance and chrominance components are shown in Table 1 and Table 2 respectively.

TABLE 1 Normalised z-values, i.e. 256 × z, for luminance (default)

TABLE 2 Normalised z-values, i.e. 256 × z, for chrominance (default)

For a description of preferred techniques for making information relating to earlier coding and decoding processes available to subsequent coding and decoding processes, reference is directed to EP-A-0 765 576, EP-A-0 807 356 and WO-A-9803017.

It is further shown that the first order source entropy of the second generation coefficients can be used to derive an estimate of the bit rate that results from the MPEG-2 intra vlc codeword table. This would simplify the computation of the bit rate if the transcoder had to choose between the TM5, the mse and the map cost functions on the basis of the best rate-distortion performance. The resulting PSNR values can be compared in the transcoder on the basis of the Laplacian model pdf. This could be simplified for the map cost function owing to the monotonic behaviour of its rate-distortion performance: e.g. after setting a target bit rate on a frame or block basis, the parameter of the reference quantiser can be increased until the first order source entropy exceeds the target bit rate. The investigation of an ‘easy-to-implement’ algorithm based on the above rate-distortion considerations is a promising goal of future work. Furthermore, the presented results can be adapted for transcoding of MPEG-2 inter-frames, i.e. P- and B-frames, involving motion compensated prediction. However, the problem of drift [OW-94] [OW-96] between the predictors of the encoder and the decoder must then also be taken into account.
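The rate control idea above (raise the reference quantiser parameter until the first order entropy passes a target) can be sketched as follows. This is a simplified illustration: it applies λ_ref directly to the given amplitudes as a (q₂, λ_ref)-type quantiser, omitting the (μ_map − λ₁)·q₁/q₂ offset of eq. (48), measures rate as first order entropy in bits per coefficient rather than actual MPEG-2 vlc bits, and scans λ_ref in fixed steps. All names are hypothetical.

```python
import math
from collections import Counter

def first_order_entropy(levels):
    """Bits per coefficient: H = -sum(p * log2 p) over the observed levels."""
    n = len(levels)
    return -sum(c / n * math.log2(c / n) for c in Counter(levels).values())

def quantize(a, q, lam):
    # (q, lambda)-type quantiser on magnitudes, sign reapplied (assumed form).
    level = math.floor(abs(a) / q + lam / 2.0)
    return level if a >= 0 else -level

def pick_lambda_ref(amplitudes, q2, target_bits, steps=20):
    """Increase lambda_ref from 0 to 1; return the last value whose entropy
    does not exceed target_bits, or None if even lambda_ref = 0 exceeds it."""
    best = None
    for k in range(steps + 1):
        lam = k / steps
        levels = [quantize(a, q2, lam) for a in amplitudes]
        if first_order_entropy(levels) > target_bits:
            break  # relies on the monotonic behaviour described above
        best = lam
    return best
```

Stopping at the first value that exceeds the target relies on the monotonic rate behaviour noted for the map cost function; for cost functions without that property, a full scan would be needed.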

What is claimed is:
 1. A compression transcoder for changing the bit rate of a compressed digital signal which has been compression encoded in a first generation quantiser Q₁, having a quantisation step size q₁ and a parameter λ₁ which controls the bounds of decision levels to be mapped onto representation levels of the quantiser, the quantiser operating on a set of transform coefficients x_k representative of respective frequency indices f_k to produce an amplitude y₁ = Q₁(x), the compression transcoder comprising a second generation quantiser Q₂, having a quantisation step size q₂ and a parameter λ₂ which controls the bounds of decision levels to be mapped onto representation levels of the quantiser, in which λ₂ is dynamically controlled in dependence upon values from the previous encoding of the digital signal, and in which the second generation λ₂ value is controlled as a function: λ₂ = λ₂(f_k, q₁, λ₁, q₂, y₁).
 2. A compression transcoder according to claim 1 in which the second generation λ₂ value is controlled as a function: λ₂ = λ₂(f_k, q₁, λ₁, q₂, y₁, λ_(2,ref)) and in which the parameter λ_(2,ref) represents a notional reference (q₂, λ_(2,ref))-type quantiser which bypasses the first generation coding and directly maps an original amplitude x onto a second generation reference amplitude y_(2,ref) = Q_(2,ref)(x).
 3. A compression transcoder according to claim 2 in which the parameter λ_(2,ref) is selected empirically.
 4. A compression transcoder according to claim 2 in which the parameter λ_(2,ref) is fixed for each frequency.